orders (3.4m rows, 206k users):

* order_id: order identifier
* user_id: customer identifier
* eval_set: which evaluation set this order belongs in (see SET described below)
* order_number: the order sequence number for this user (1 = first, n = nth)
* order_dow: the day of the week the order was placed on
* order_hour_of_day: the hour of the day the order was placed on
* days_since_prior: days since the last order, capped at 30 (with NAs for order_number = 1)

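The days_since_prior rule can be illustrated with a small sketch. The toy orders data frame and its order_date column are invented for illustration (the dataset itself does not include dates); the code only mirrors the description: NA for a user's first order, and the gap capped at 30.

```r
# Toy order history for two users; the dates are made up for illustration.
orders <- data.frame(
  user_id      = c(1, 1, 1, 2, 2),
  order_number = c(1, 2, 3, 1, 2),
  order_date   = as.Date(c("2017-01-01", "2017-01-08", "2017-03-01",
                           "2017-02-01", "2017-02-04"))
)

# Sort so that each user's orders appear in sequence.
orders <- orders[order(orders$user_id, orders$order_number), ]

# Gap in days to the same user's previous order; NA for the first order.
gap <- ave(as.numeric(orders$order_date), orders$user_id,
           FUN = function(d) c(NA, diff(d)))

# Cap the gap at 30 days, as the data dictionary describes.
orders$days_since_prior <- pmin(gap, 30)

print(orders)
```

In this toy data the third order of user 1 comes 52 days after the second, so it is recorded as 30. Any analysis of this column should keep in mind that 30 means "30 or more".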
products (50k rows):

* product_id: product identifier
* product_name: name of the product
* aisle_id: foreign key
* department_id: foreign key

aisles (134 rows):

* aisle_id: aisle identifier
* aisle: the name of the aisle

departments (21 rows):

* department_id: department identifier
* department: the name of the department

order_products__SET (30m+ rows):

* order_id: foreign key
* product_id: foreign key
* add_to_cart_order: order in which each product was added to cart
* reordered: 1 if this product has been ordered by this user in the past, 0 otherwise

where SET is one of the three following evaluation sets (eval_set in orders):

* "prior": orders prior to that user's most recent order (~3.2m orders)
* "train": training data supplied to participants (~131k orders)
* "test": test data reserved for machine learning competitions (~75k orders)

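As a rough sketch of how the foreign keys tie the tables together, the following joins a few order lines to their product, aisle and department names using base R merge(). The small data frames are stand-ins built from rows shown in the previews of this report; the variable names (items and so on) are chosen for illustration and are not necessarily those used in the analysis.

```r
# Tiny stand-ins for the real tables, using rows that appear in this report.
products <- data.frame(
  product_id    = c(13176, 27966, 21903),
  product_name  = c("Bag of Organic Bananas", "Organic Raspberries",
                    "Organic Baby Spinach"),
  aisle_id      = c(24, 123, 123),
  department_id = c(4, 4, 4),
  stringsAsFactors = FALSE
)
aisles <- data.frame(
  aisle_id = c(24, 123),
  aisle    = c("fresh fruits", "packaged vegetables fruits"),
  stringsAsFactors = FALSE
)
departments <- data.frame(
  department_id = 4,
  department    = "produce",
  stringsAsFactors = FALSE
)
order_products <- data.frame(
  order_id          = c(1, 5, 5, 3),
  product_id        = c(13176, 13176, 27966, 21903),
  add_to_cart_order = c(6, 1, 4, 4),
  reordered         = c(0, 1, 1, 1)
)

# Follow the foreign keys: product_id, then aisle_id, then department_id.
items <- merge(order_products, products, by = "product_id")
items <- merge(items, aisles, by = "aisle_id")
items <- merge(items, departments, by = "department_id")

# Keep the readable columns, ordered by order and cart position.
items <- items[order(items$order_id, items$add_to_cart_order),
               c("order_id", "add_to_cart_order", "product_name",
                 "aisle", "department", "reordered")]
print(items)
```

merge() performs an inner join by default, so order lines whose product_id has no match in products would be dropped silently; passing all.x = TRUE keeps them.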
## [1] 0
| aisle_id | aisle |
|---|---|
| 1 | prepared soups salads |
| 2 | specialty cheeses |
| 3 | energy granola bars |
| 4 | instant foods |
| 5 | marinades meat preparation |
| 6 | other |
| 7 | packaged meat |
| 8 | bakery desserts |
| 9 | pasta sauce |
| 10 | kitchen supplies |
| 11 | cold flu allergy |
| 12 | fresh pasta |
| 13 | prepared meals |
| 14 | tofu meat alternatives |
| 15 | packaged seafood |
| 16 | fresh herbs |
| 17 | baking ingredients |
| 18 | bulk dried fruits vegetables |
| 19 | oils vinegars |
| 20 | oral hygiene |
| 21 | packaged cheese |
| 22 | hair care |
| 23 | popcorn jerky |
| 24 | fresh fruits |
| 25 | soap |
| 26 | coffee |
| 27 | beers coolers |
| 28 | red wines |
| 29 | honeys syrups nectars |
| 30 | latino foods |
| 31 | refrigerated |
| 32 | packaged produce |
| 33 | kosher foods |
| 34 | frozen meat seafood |
| 35 | poultry counter |
| 36 | butter |
| 37 | ice cream ice |
| 38 | frozen meals |
| 39 | seafood counter |
| 40 | dog food care |
| 41 | cat food care |
| 42 | frozen vegan vegetarian |
| 43 | buns rolls |
| 44 | eye ear care |
| 45 | candy chocolate |
| 46 | mint gum |
| 47 | vitamins supplements |
| 48 | breakfast bars pastries |
| 49 | packaged poultry |
| 50 | fruit vegetable snacks |
| 51 | preserved dips spreads |
| 52 | frozen breakfast |
| 53 | cream |
| 54 | paper goods |
| 55 | shave needs |
| 56 | diapers wipes |
| 57 | granola |
| 58 | frozen breads doughs |
| 59 | canned meals beans |
| 60 | trash bags liners |
| 61 | cookies cakes |
| 62 | white wines |
| 63 | grains rice dried goods |
| 64 | energy sports drinks |
| 65 | protein meal replacements |
| 66 | asian foods |
| 67 | fresh dips tapenades |
| 68 | bulk grains rice dried goods |
| 69 | soup broth bouillon |
| 70 | digestion |
| 71 | refrigerated pudding desserts |
| 72 | condiments |
| 73 | facial care |
| 74 | dish detergents |
| 75 | laundry |
| 76 | indian foods |
| 77 | soft drinks |
| 78 | crackers |
| 79 | frozen pizza |
| 80 | deodorants |
| 81 | canned jarred vegetables |
| 82 | baby accessories |
| 83 | fresh vegetables |
| 84 | milk |
| 85 | food storage |
| 86 | eggs |
| 87 | more household |
| 88 | spreads |
| 89 | salad dressing toppings |
| 90 | cocoa drink mixes |
| 91 | soy lactosefree |
| 92 | baby food formula |
| 93 | breakfast bakery |
| 94 | tea |
| 95 | canned meat seafood |
| 96 | lunch meat |
| 97 | baking supplies decor |
| 98 | juice nectars |
| 99 | canned fruit applesauce |
| 100 | missing |
| 101 | air fresheners candles |
| 102 | baby bath body care |
| 103 | ice cream toppings |
| 104 | spices seasonings |
| 105 | doughs gelatins bake mixes |
| 106 | hot dogs bacon sausage |
| 107 | chips pretzels |
| 108 | other creams cheeses |
| 109 | skin care |
| 110 | pickled goods olives |
| 111 | plates bowls cups flatware |
| 112 | bread |
| 113 | frozen juice |
| 114 | cleaning products |
| 115 | water seltzer sparkling water |
| 116 | frozen produce |
| 117 | nuts seeds dried fruit |
| 118 | first aid |
| 119 | frozen dessert |
| 120 | yogurt |
| 121 | cereal |
| 122 | meat counter |
| 123 | packaged vegetables fruits |
| 124 | spirits |
| 125 | trail mix snack mix |
| 126 | feminine care |
| 127 | body lotions soap |
| 128 | tortillas flat bread |
| 129 | frozen appetizers sides |
| 130 | hot cereal pancake mixes |
| 131 | dry pasta |
| 132 | beauty |
| 133 | muscles joints pain relief |
| 134 | specialty wines champagnes |
## [1] 0
| department_id | department |
|---|---|
| 1 | frozen |
| 2 | other |
| 3 | bakery |
| 4 | produce |
| 5 | alcohol |
| 6 | international |
| 7 | beverages |
| 8 | pets |
| 9 | dry goods pasta |
| 10 | bulk |
| 11 | personal care |
| 12 | meat seafood |
| 13 | pantry |
| 14 | breakfast |
| 15 | canned goods |
| 16 | dairy eggs |
| 17 | household |
| 18 | babies |
| 19 | snacks |
| 20 | deli |
| 21 | missing |
## [1] 0
| product_id | product_name | aisle_id | department_id |
|---|---|---|---|
| 1 | Chocolate Sandwich Cookies | 61 | 19 |
| 2 | All-Seasons Salt | 104 | 13 |
| 3 | Robust Golden Unsweetened Oolong Tea | 94 | 7 |
| 4 | Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce | 38 | 1 |
| 5 | Green Chile Anytime Sauce | 5 | 13 |
| 6 | Dry Nose Oil | 11 | 11 |
| 7 | Pure Coconut Water With Orange | 98 | 7 |
| 8 | Cut Russet Potatoes Steam N’ Mash | 116 | 1 |
| 9 | Light Strawberry Blueberry Yogurt | 120 | 16 |
| 10 | Sparkling Orange Juice & Prickly Pear Beverage | 115 | 7 |
| 11 | Peach Mango Juice | 31 | 7 |
| 12 | Chocolate Fudge Layer Cake | 119 | 1 |
| 13 | Saline Nasal Mist | 11 | 11 |
| 14 | Fresh Scent Dishwasher Cleaner | 74 | 17 |
| 15 | Overnight Diapers Size 6 | 56 | 18 |
| 16 | Mint Chocolate Flavored Syrup | 103 | 19 |
| 17 | Rendered Duck Fat | 35 | 12 |
| 18 | Pizza for One Suprema Frozen Pizza | 79 | 1 |
| 19 | Gluten Free Quinoa Three Cheese & Mushroom Blend | 63 | 9 |
| 20 | Pomegranate Cranberry & Aloe Vera Enrich Drink | 98 | 7 |
| 21 | Small & Medium Dental Dog Treats | 40 | 8 |
| 22 | Fresh Breath Oral Rinse Mild Mint | 20 | 11 |
| 23 | Organic Turkey Burgers | 49 | 12 |
| 24 | Tri-Vi-Sol® Vitamins A-C-and D Supplement Drops for Infants | 47 | 11 |
| 25 | Salted Caramel Lean Protein & Fiber Bar | 3 | 19 |
| 26 | Fancy Feast Trout Feast Flaked Wet Cat Food | 41 | 8 |
| 27 | Complete Spring Water Foaming Antibacterial Hand Wash | 127 | 11 |
| 28 | Wheat Chex Cereal | 121 | 14 |
| 29 | Fresh Cut Golden Sweet No Salt Added Whole Kernel Corn | 81 | 15 |
| 30 | Three Cheese Ziti, Marinara with Meatballs | 38 | 1 |
| 31 | White Pearl Onions | 123 | 4 |
| 32 | Nacho Cheese White Bean Chips | 107 | 19 |
| 33 | Organic Spaghetti Style Pasta | 131 | 9 |
| 34 | Peanut Butter Cereal | 121 | 14 |
| 35 | Italian Herb Porcini Mushrooms Chicken Sausage | 106 | 12 |
| 36 | Traditional Lasagna with Meat Sauce Savory Italian Recipes | 38 | 1 |
| 37 | Noodle Soup Mix With Chicken Broth | 69 | 15 |
| 38 | Ultra Antibacterial Dish Liquid | 100 | 21 |
| 39 | Daily Tangerine Citrus Flavored Beverage | 64 | 7 |
| 40 | Beef Hot Links Beef Smoked Sausage With Chile Peppers | 106 | 12 |
| 41 | Organic Sourdough Einkorn Crackers Rosemary | 78 | 19 |
| 42 | Biotin 1000 mcg | 47 | 11 |
| 43 | Organic Clementines | 123 | 4 |
| 44 | Sparkling Raspberry Seltzer | 115 | 7 |
| 45 | European Cucumber | 83 | 4 |
| 46 | Raisin Cinnamon Bagels 5 count | 58 | 1 |
| 47 | Onion Flavor Organic Roasted Seaweed Snack | 66 | 6 |
| 48 | School Glue, Washable, No Run | 87 | 17 |
| 49 | Vegetarian Grain Meat Sausages Italian - 4 CT | 14 | 20 |
| 50 | Pumpkin Muffin Mix | 105 | 13 |
## [1] 0
| order_id | product_id | add_to_cart_order | reordered |
|---|---|---|---|
| 1 | 49302 | 1 | 1 |
| 1 | 11109 | 2 | 1 |
| 1 | 10246 | 3 | 0 |
| 1 | 49683 | 4 | 0 |
| 1 | 43633 | 5 | 1 |
| 1 | 13176 | 6 | 0 |
| 1 | 47209 | 7 | 0 |
| 1 | 22035 | 8 | 1 |
| 36 | 39612 | 1 | 0 |
| 36 | 19660 | 2 | 1 |
| 36 | 49235 | 3 | 0 |
| 36 | 43086 | 4 | 1 |
| 36 | 46620 | 5 | 1 |
| 36 | 34497 | 6 | 1 |
| 36 | 48679 | 7 | 1 |
| 36 | 46979 | 8 | 1 |
| 38 | 11913 | 1 | 0 |
| 38 | 18159 | 2 | 0 |
| 38 | 4461 | 3 | 0 |
| 38 | 21616 | 4 | 1 |
| 38 | 23622 | 5 | 0 |
| 38 | 32433 | 6 | 0 |
| 38 | 28842 | 7 | 0 |
| 38 | 42625 | 8 | 0 |
| 38 | 39693 | 9 | 0 |
| 96 | 20574 | 1 | 1 |
| 96 | 30391 | 2 | 0 |
| 96 | 40706 | 3 | 1 |
| 96 | 25610 | 4 | 0 |
| 96 | 27966 | 5 | 1 |
| 96 | 24489 | 6 | 1 |
| 96 | 39275 | 7 | 1 |
| 98 | 8859 | 1 | 1 |
| 98 | 19731 | 2 | 1 |
| 98 | 43654 | 3 | 1 |
| 98 | 13176 | 4 | 1 |
| 98 | 4357 | 5 | 1 |
| 98 | 37664 | 6 | 1 |
| 98 | 34065 | 7 | 1 |
| 98 | 35951 | 8 | 1 |
| 98 | 43560 | 9 | 1 |
| 98 | 9896 | 10 | 1 |
| 98 | 27509 | 11 | 1 |
| 98 | 15455 | 12 | 1 |
| 98 | 27966 | 13 | 1 |
| 98 | 47601 | 14 | 1 |
| 98 | 40396 | 15 | 1 |
| 98 | 35042 | 16 | 1 |
| 98 | 40986 | 17 | 1 |
| 98 | 1939 | 18 | 1 |
## [1] 0
| order_id | product_id | add_to_cart_order | reordered |
|---|---|---|---|
| 2 | 33120 | 1 | 1 |
| 2 | 28985 | 2 | 1 |
| 2 | 9327 | 3 | 0 |
| 2 | 45918 | 4 | 1 |
| 2 | 30035 | 5 | 0 |
| 2 | 17794 | 6 | 1 |
| 2 | 40141 | 7 | 1 |
| 2 | 1819 | 8 | 1 |
| 2 | 43668 | 9 | 0 |
| 3 | 33754 | 1 | 1 |
| 3 | 24838 | 2 | 1 |
| 3 | 17704 | 3 | 1 |
| 3 | 21903 | 4 | 1 |
| 3 | 17668 | 5 | 1 |
| 3 | 46667 | 6 | 1 |
| 3 | 17461 | 7 | 1 |
| 3 | 32665 | 8 | 1 |
| 4 | 46842 | 1 | 0 |
| 4 | 26434 | 2 | 1 |
| 4 | 39758 | 3 | 1 |
| 4 | 27761 | 4 | 1 |
| 4 | 10054 | 5 | 1 |
| 4 | 21351 | 6 | 1 |
| 4 | 22598 | 7 | 1 |
| 4 | 34862 | 8 | 1 |
| 4 | 40285 | 9 | 1 |
| 4 | 17616 | 10 | 1 |
| 4 | 25146 | 11 | 1 |
| 4 | 32645 | 12 | 1 |
| 4 | 41276 | 13 | 1 |
| 5 | 13176 | 1 | 1 |
| 5 | 15005 | 2 | 1 |
| 5 | 47329 | 3 | 1 |
| 5 | 27966 | 4 | 1 |
| 5 | 23909 | 5 | 1 |
| 5 | 48370 | 6 | 1 |
| 5 | 13245 | 7 | 1 |
| 5 | 9633 | 8 | 1 |
| 5 | 27360 | 9 | 1 |
| 5 | 6348 | 10 | 1 |
| 5 | 40878 | 11 | 1 |
| 5 | 6184 | 12 | 1 |
| 5 | 48002 | 13 | 1 |
| 5 | 20914 | 14 | 1 |
| 5 | 37011 | 15 | 1 |
| 5 | 12962 | 16 | 1 |
| 5 | 45698 | 17 | 1 |
| 5 | 24773 | 18 | 1 |
| 5 | 18569 | 19 | 1 |
| 5 | 41176 | 20 | 1 |
## [1] 206209
## [1] 206209
The first chart, days_since_prior_order, shows that users are most likely
to place their next order about a week after the previous one. The
order_dow chart shows that orders are most frequent on Sundays and
Mondays compared with the rest of the week, and the last chart,
order_hour_of_day, shows high order demand between 9am and 6pm.
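The charts themselves are not reproduced here. As a minimal sketch of how such distributions could be tabulated, the following uses a handful of invented rows; the values are hypothetical and are not meant to reproduce the patterns described above.

```r
# Hypothetical sample of the three columns behind the charts.
orders <- data.frame(
  order_dow              = c(0, 0, 1, 1, 1, 3, 6),
  order_hour_of_day      = c(9, 10, 10, 14, 15, 20, 10),
  days_since_prior_order = c(NA, 7, 7, 30, 7, 14, NA)
)

# Frequency of each value; table() leaves NA out by default.
dow_counts  <- table(orders$order_dow)
hour_counts <- table(orders$order_hour_of_day)
gap_counts  <- table(orders$days_since_prior_order)

# Most frequent value of each variable (returned as a character label).
top_dow  <- names(which.max(dow_counts))
top_hour <- names(which.max(hour_counts))
top_gap  <- names(which.max(gap_counts))

# Share of orders placed from 9am up to, but not including, 6pm.
daytime_share <- mean(orders$order_hour_of_day >= 9 &
                      orders$order_hour_of_day < 18)

print(list(top_dow = top_dow, top_hour = top_hour,
           top_gap = top_gap, daytime_share = daytime_share))
```

Passing any of the count tables to barplot() draws the corresponding bar chart.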
## > memory.size()
## [1] 7068.48
## > memory.size(TRUE)
## [1] 7416.12
## > memory.limit()
## [1] 8109
## [1] 56000
#### Mandy comment -> Still keeping the top 10 reordered products and the top 10 reordered products by aisle, just to see the overall picture.
Top 10 reordered products
As the table shows, the most reordered products come mainly from the fresh fruits and packaged vegetables fruits aisles of the produce department.

| product_id | product_name | aisle | department | total_reorder | total_order | percentage_reorder |
|---|---|---|---|---|---|---|
| 24852 | Banana | fresh fruits | produce | 415166 | 491291 | 84.50511 |
| 13176 | Bag of Organic Bananas | fresh fruits | produce | 329275 | 394930 | 83.37553 |
| 21137 | Organic Strawberries | fresh fruits | produce | 214448 | 275577 | 77.81781 |
| 21903 | Organic Baby Spinach | packaged vegetables fruits | produce | 194939 | 251705 | 77.44741 |
| 47209 | Organic Hass Avocado | fresh fruits | produce | 176173 | 220877 | 79.76068 |
| 47766 | Organic Avocado | fresh fruits | produce | 140270 | 184224 | 76.14100 |
| 27845 | Organic Whole Milk | milk | dairy eggs | 118684 | 142813 | 83.10448 |
| 47626 | Large Lemon | fresh fruits | produce | 112178 | 160792 | 69.76591 |
| 27966 | Organic Raspberries | packaged vegetables fruits | produce | 109688 | 142603 | 76.91844 |
| 16797 | Strawberries | fresh fruits | produce | 104588 | 149445 | 69.98428 |
Overall, approximately 60% of all products ordered are reorders.
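The values in the percentage_reorder column are consistent with 100 * total_reorder / total_order. The sketch below computes the three summary columns from a few hypothetical order lines; the toy rows are invented, and only the final check uses numbers from the table above.

```r
# Hypothetical order lines: one row per product in an order.
order_products <- data.frame(
  product_name = c("Banana", "Banana", "Banana", "Banana",
                   "Large Lemon", "Large Lemon", "Large Lemon"),
  reordered    = c(1, 1, 1, 0, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Reorders and total order lines per product.
total_reorder <- tapply(order_products$reordered,
                        order_products$product_name, sum)
total_order   <- tapply(order_products$reordered,
                        order_products$product_name, length)

summary_tbl <- data.frame(
  product_name       = names(total_order),
  total_reorder      = as.vector(total_reorder),
  total_order        = as.vector(total_order),
  percentage_reorder = 100 * as.vector(total_reorder) / as.vector(total_order),
  stringsAsFactors = FALSE
)

# Rank by number of reorders, as in the table above.
summary_tbl <- summary_tbl[order(-summary_tbl$total_reorder), ]
print(summary_tbl)

# Check the formula against the Banana row of the report's table.
banana_pct <- 100 * 415166 / 491291
print(round(banana_pct, 5))
```

Note that the table is ranked by total_reorder and not by percentage_reorder, which is why Organic Hass Avocado (79.8%) sits below Organic Baby Spinach (77.4%).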
Top 10 reordered products by aisle
| aisle | department | total_reorder | total_order | percentage_reorder |
|---|---|---|---|---|
| fresh fruits | produce | 2726251 | 3792661 | 71.88227 |
| fresh vegetables | produce | 2123540 | 3568630 | 59.50575 |
| packaged vegetables fruits | produce | 1178700 | 1843806 | 63.92755 |
| yogurt | dairy eggs | 1034957 | 1507583 | 68.65008 |
| milk | dairy eggs | 722128 | 923659 | 78.18123 |
| water seltzer sparkling water | beverages | 640988 | 878150 | 72.99300 |
| packaged cheese | dairy eggs | 598280 | 1021462 | 58.57095 |
| soy lactosefree | dairy eggs | 460069 | 664493 | 69.23609 |
| chips pretzels | snacks | 444036 | 753739 | 58.91111 |
| bread | bakery | 408010 | 608469 | 67.05518 |

Top 10 reordered products by department

| department | total_reorder | total_order | percentage_reorder |
|---|---|---|---|
| produce | 6432596 | 9888378 | 65.05208 |
| dairy eggs | 3773723 | 5631067 | 67.01613 |
| beverages | 1832952 | 2804175 | 65.36511 |
| snacks | 1727075 | 3006412 | 57.44638 |
| frozen | 1268058 | 2336858 | 54.26337 |
| bakery | 769880 | 1225181 | 62.83806 |
| pantry | 679799 | 1956819 | 34.74000 |
| deli | 666231 | 1095540 | 60.81302 |
| canned goods | 511317 | 1114857 | 45.86391 |
| meat seafood | 420349 | 739238 | 56.86247 |
Sales Patterns
#### Mandy comment -> Add PCA to the EDA (slide 23) to show which variables are important to the outcome.

Summary from TA:

- Supervised and unsupervised methods can both forecast reordered products
- Supervised = tree, SVM
- Unsupervised = clustering, PCA

Our output:

1. Forecast the number of days between the prior orders and the recorded orders -> clustering, PCA
2. Reordered products
3. Association between products: PCA and clustering in class

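As a minimal sketch of what the planned PCA and clustering steps might look like, the following runs prcomp() and kmeans() from base R on simulated per-user features. The feature names (n_orders, mean_days_gap, reorder_ratio) and the simulated values are assumptions made for illustration; they are not computed from the Instacart data and are not the team's chosen variables.

```r
set.seed(42)  # k-means starts from random centers

# Simulated per-user features for two loosely defined groups of shoppers.
n <- 60
users <- data.frame(
  n_orders      = c(rnorm(n / 2, mean = 5,   sd = 1),
                    rnorm(n / 2, mean = 40,  sd = 5)),
  mean_days_gap = c(rnorm(n / 2, mean = 25,  sd = 3),
                    rnorm(n / 2, mean = 7,   sd = 2)),
  reorder_ratio = c(rnorm(n / 2, mean = 0.3, sd = 0.05),
                    rnorm(n / 2, mean = 0.7, sd = 0.05))
)

# PCA on standardized features, so no variable dominates through its scale.
pca <- prcomp(users, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Loadings: how strongly each variable contributes to each component.
print(pca$rotation)

# k-means on the first two principal components.
km <- kmeans(pca$x[, 1:2], centers = 2, nstart = 10)
print(table(km$cluster))
```

The loadings show which variables drive the variation between users. They do not by themselves measure importance for an outcome such as reordering; that would need a supervised model such as the tree or SVM mentioned in the TA summary.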